The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable share of participants (32%) stated that they did not have enough time for method development, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. Only 37% of participants performed k-fold cross-validation on the training set, and only 50% performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
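As an aside, a minimal sketch of patch-based training, the most common strategy the survey reports for oversized samples, is given below; the patch size, tensor shapes, and random data are illustrative assumptions, not taken from any surveyed solution.

```python
# A minimal sketch of patch-based training for samples too large to process
# at once; patch size and tensor shapes are illustrative assumptions.
import torch

def random_patch(volume: torch.Tensor, mask: torch.Tensor, size=(64, 64, 64)):
    """Crop the same random patch from a (D, H, W) volume and its label mask."""
    corners = [torch.randint(0, volume.shape[i] - size[i] + 1, (1,)).item()
               for i in range(3)]
    sl = tuple(slice(c, c + s) for c, s in zip(corners, size))
    return volume[sl], mask[sl]

# Hypothetical usage inside a training loop:
volume = torch.rand(128, 256, 256)      # stand-in for a large 3D scan
mask = (torch.rand(128, 256, 256) > 0.5).float()
patch, patch_mask = random_patch(volume, mask)
print(patch.shape)                      # torch.Size([64, 64, 64])
```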
Domain shift is a well-known problem in the medical imaging community. In particular, for endoscopic image analysis, where the data can come from different modalities, the performance of deep learning (DL) methods is adversely affected: methods developed on one modality cannot be used on another. However, in real clinical settings, endoscopists switch between modalities for better mucosal visualisation. In this paper, we explore domain generalisation techniques to enable DL methods to be used in such scenarios. To this end, we propose to use superpixels generated with Simple Linear Iterative Clustering (SLIC), which we refer to as "SUPRA" for SUPeRpixel Augmented method. SUPRA first generates a preliminary segmentation mask making use of our new loss, "SLICLoss", which encourages both an accurate and a colour-consistent segmentation. We demonstrate that SLICLoss, when combined with the Binary Cross Entropy (BCE) loss, can improve the model's generalisability on data that presents significant domain shift. We validate this novel compound loss on a vanilla U-Net using the EndoUDA dataset, which contains images of Barrett's esophagus and polyps from two modalities. We show that our method yields an improvement of nearly 25% on the target domain set compared to the baseline.
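A minimal sketch of the SLIC superpixels this method builds on follows; the per-superpixel colour statistics only illustrate the intuition behind a colour-consistency term and are not the paper's exact SLICLoss.

```python
# SLIC superpixels via scikit-image; the sample image is a stand-in for an
# endoscopic frame, and the statistics below are illustrative only.
import numpy as np
from skimage import data
from skimage.segmentation import slic

image = data.astronaut()
segments = slic(image, n_segments=200, compactness=10, start_label=0)

# Pixels within one superpixel should agree in colour; a loss can encourage
# predictions to respect these colour-homogeneous regions.
mean_colors = np.stack([image[segments == s].mean(axis=0)
                        for s in np.unique(segments)])
print(segments.shape, mean_colors.shape)   # (H, W), (n_superpixels, 3)
```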
Minimally invasive surgery is highly operator-dependent and involves lengthy procedure times, leading to fatigue and risk to patients. To mitigate these risks, real-time systems can help surgeons navigate and track tools by providing a clear understanding of the scene and avoiding misestimations during the operation. While several efforts have been made in this direction, the lack of diverse datasets, together with highly dynamic scenes and their variability across patients, remain significant hurdles to achieving robust systems. In this work, we present a systematic review of recent machine learning-based approaches to surgical tool localisation, segmentation, tracking, and 3D scene perception. Furthermore, we identify current gaps and directions for these methods and provide the rationale behind their clinical integration.
Surgical tool detection in minimally invasive surgery is an essential component of computer-assisted interventions. Current approaches are mostly based on supervised methods, which require large amounts of fully labelled data to train supervised models and suffer from pseudo-label bias due to class imbalance. However, large image datasets with bounding-box annotations are often scarcely available. Semi-supervised learning (SSL) has recently emerged as a means of training large models using only a modest amount of annotated data; in addition to reducing annotation cost, SSL has also shown promise in producing more robust and generalisable models. Therefore, in this paper we introduce a semi-supervised learning (SSL) framework for the surgical tool detection paradigm, which aims to mitigate training data scarcity and data imbalance through a knowledge distillation approach. In the proposed work, we train a model on labelled data, which initialises joint teacher-student learning, where the student is trained on teacher-generated pseudo-labels from unlabelled data. We propose a multi-class distance with a margin-based classification loss function in the region-of-interest head of the detector to effectively separate foreground classes from background regions. Our results on the m2cai16-tool-locations dataset demonstrate the superiority of our approach on different supervised data settings (1%, 2%, 5%, and 10% of annotated data), where our model achieves overall improvements of 8%, 12%, and 27% in mAP (on 1% labelled data) over state-of-the-art SSL methods and a fully supervised baseline. The code is available at https://github.com/mansoor-at/semi-supervise-surgical-tool-det
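A schematic sketch of the teacher-student pseudo-labelling loop described above follows; the teacher's output format, the confidence threshold, and all names are assumptions made for illustration rather than the paper's exact pipeline.

```python
# Schematic teacher-student pseudo-labelling; shapes and threshold are assumed.
import torch

@torch.no_grad()
def make_pseudo_labels(teacher, images, threshold=0.7):
    """Keep only the proposals the teacher classifies with high confidence."""
    logits = teacher(images)              # assumed shape: (num_proposals, C)
    conf, cls = logits.softmax(dim=-1).max(dim=-1)
    keep = conf > threshold               # filter low-confidence pseudo-labels
    return cls[keep], keep

def student_step(student, optimizer, criterion, images, pseudo_cls, keep):
    """One update of the student on teacher-generated pseudo-labels."""
    logits = student(images)
    loss = criterion(logits[keep], pseudo_cls)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```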
The prevalence of gastrointestinal (GI) cancers is rising at an alarming rate every year, leading to a substantial increase in mortality. Endoscopic detection provides crucial diagnostic support; however, subtle lesions in the upper GI tract are difficult to detect and give rise to a large number of missed detections. In this work, we leverage deep learning to develop a framework that improves the localisation of hard-to-detect lesions and minimises the missed detection rate. We propose an end-to-end student-teacher learning setup in which the class probabilities of a teacher model trained with one class on a larger dataset are used to penalise a multi-class student network. Our model achieves higher performance in terms of mean average precision (mAP) on both the Endoscopy Disease Detection (EDD2020) challenge and the Kvasir-SEG dataset. Additionally, we show that with such a learning paradigm our model generalises to unseen test sets, yielding higher APs for the clinically crucial neoplastic and polyp categories.
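A minimal sketch of penalising a multi-class student with a teacher's class probabilities via a KL-divergence term follows; this is one standard way to realise such a student-teacher penalty, and the weighting and temperature are assumptions, not the paper's exact formulation.

```python
# Distillation-style penalty: hard-label cross-entropy plus KL towards the
# teacher's soft labels; alpha and T are illustrative hyper-parameters.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs, targets, alpha=0.5, T=2.0):
    """Cross-entropy on hard labels plus KL towards the teacher's probabilities."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl
```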
Inflammatory bowel disease (IBD), and in particular ulcerative colitis (UC), is graded by endoscopists, and this assessment is the basis for risk stratification and therapy monitoring. At present, endoscopic characterisation is largely operator-dependent, leading to sometimes undesirable clinical outcomes for patients with IBD. We focus on the Mayo Endoscopic Scoring (MES) system, which is widely used but requires the reliable identification of subtle changes in mucosal inflammation. Most existing deep learning classification methods cannot detect these fine-grained changes, which makes UC grading a challenging task. In this work, we introduce a novel patch-level instance-group discrimination with pretext-invariant representation learning (PLD-PIRL) for self-supervised learning (SSL). Our experiments demonstrate improved accuracy and robustness compared to both a baseline supervised network and several state-of-the-art SSL methods. Compared to the baseline (ResNet50) supervised classification, our proposed PLD-PIRL obtains a 4.75% improvement in top-1 accuracy on hold-out test data and a 6.64% improvement on unseen-centre test data.
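A minimal sketch of an InfoNCE-style contrastive objective over patch embeddings follows, showing the general mechanism behind instance(-group) discrimination; it is not the exact PLD-PIRL formulation, and all names are illustrative.

```python
# InfoNCE over two augmented views of the same patches; matching rows are
# treated as positives, all other rows in the batch as negatives.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N patches."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)       # diagonal pairs are positives
```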
Colonoscopy is a gold-standard procedure but is highly operator-dependent. Efforts have been made to automate the detection and segmentation of polyps, a precancerous precursor, in order to effectively reduce miss rates. Widely used computer-assisted polyp segmentation systems driven by encoder-decoder architectures achieve high performance in terms of accuracy. However, polyp segmentation datasets collected from different centres can follow different imaging protocols, leading to differences in data distribution. As a result, most methods suffer from performance degradation and require retraining for each specific dataset. We address this generalisation issue by proposing a Global Multi-Scale Residual Fusion Network (GMSRF-Net). Our proposed network maintains high-resolution representations while performing multi-scale fusion operations for all resolution scales. To further leverage scale information, we design cross multi-scale attention (CMSA) and multi-scale feature selection (MSFS) modules within GMSRF-Net. The repeated fusion operations gated by CMSA and MSFS demonstrate the improved generalisability of the network. Experiments conducted on two different polyp segmentation datasets show that our proposed GMSRF-Net outperforms previous state-of-the-art methods on unseen CVC-ClinicDB and unseen Kvasir-SEG in terms of Dice coefficient.
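A toy sketch of fusing features across resolution scales follows, showing the general idea behind repeated multi-scale fusion; the real CMSA and MSFS modules are more elaborate, and this module is only an illustrative stand-in.

```python
# Fuse a coarse feature map into a fine one: upsample, concatenate, 1x1 conv.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, fine, coarse):
        # Upsample the coarse map to the fine map's size, then fuse, so the
        # fine branch keeps its resolution while absorbing coarse context.
        up = F.interpolate(coarse, size=fine.shape[-2:], mode="bilinear",
                           align_corners=False)
        return self.mix(torch.cat([fine, up], dim=1))
```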
Precise instrument segmentation helps surgeons navigate the body more easily and improves patient safety. While accurate real-time tracking of surgical instruments plays a crucial role in minimally invasive computer-assisted surgeries, it is a challenging task, mainly due to 1) the complex surgical environment and 2) model designs that trade off accuracy against speed. Deep learning gives us the opportunity to learn such complex environments and the placements of these instruments from large surgical scene corpora in real-world scenarios. The Robust Medical Instrument Segmentation 2019 challenge (ROBUST-MIS) provides more than 10,000 frames with surgical tools in different clinical settings. In this paper, we use a lightweight single-stage instance segmentation model, augmented with a convolutional block attention module, to achieve faster and more accurate inference. We further improve accuracy through data augmentation and an optimal anchor localisation strategy. To our knowledge, this is the first work that explicitly focuses on real-time performance while also improving accuracy. We surpass the top team performances of the ROBUST-MIS challenge by over 44% on both the area-based metric MI_DSC and the distance-based metric MI_NSD. We also demonstrate the real-time performance (> 60 frames per second) of different but competitive variants of our final approach.
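A compact sketch of a convolutional block attention module (CBAM) follows, the kind of auxiliary attention block mentioned above; the hyper-parameters (reduction ratio, kernel size) follow common CBAM defaults rather than this paper's exact settings.

```python
# CBAM: channel attention from pooled descriptors, then spatial attention
# from channel-pooled maps; defaults are the commonly used ones.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention: shared MLP over average- and max-pooled features.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention: convolution over channel-wise mean and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```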
Polyps in the colon are well known as precursors of cancer and are identified through colonoscopy, whether in the diagnostic work-up of symptoms, in colorectal cancer screening, or in the systematic surveillance of certain diseases. While most polyps are benign, the number, size, and surface structure of polyps are closely linked to the risk of colon cancer. High missed-detection rates and incomplete removal of colonic polyps persist due to their variable nature, the difficulty of delineating abnormalities, high recurrence rates, and the anatomical topography of the colon. In the past, several methods have been built to automate polyp detection and segmentation. However, the key issue with most methods is that they have not been rigorously tested on large, multi-centre, purpose-built datasets. Thus, these methods may not generalise to datasets from different populations, as they overfit to a specific population and endoscopic surveillance setting. In this vein, we have curated a dataset from six different centres incorporating more than 300 patients. The dataset includes both single-frame and sequence data, with 3,446 annotated polyp labels featuring precise delineation of polyp boundaries verified by six senior gastroenterologists. To our knowledge, this is the most comprehensive detection and pixel-level segmentation dataset curated by a team of computational scientists and expert gastroenterologists. This dataset originated as part of the EndoCV2021 challenge, which aimed to address generalisability in polyp detection and segmentation. In this paper, we provide comprehensive insight into the data construction and annotation strategies, annotation quality assurance, and technical validation of our extended EndoCV2021 dataset, which we refer to as PolypGen.
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
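A minimal sketch of the kind of quantitative evaluation mentioned above follows, scoring a generated description against a reference with BLEU, a common machine translation metric; both token lists are made-up examples, and the smoothing choice is an assumption for short sentences.

```python
# Sentence-level BLEU via NLTK; smoothing avoids zero scores on short texts.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["screen", "for", "signing", "in", "to", "an", "account"]]
candidate = ["login", "screen", "for", "an", "account"]

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```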